Qualcomm AI Engine Direct - heap profiling at runtime on target by jethroqti · Pull Request #19716 · pytorch/executorch

jethroqti · 2026-05-21T07:37:26Z

Qualcomm AI Engine Direct - heap profiling at runtime on target

Summary:
Adds runtime heap profiling for the HTP backend on Android platforms. DSP
heap profiling is available for QnnContext_createFromBinary use cases.
It captures total DSP heap usage at two checkpoints:
- Before the first context is created (before_context_created)
- After the last context is freed (after_context_freed)

The difference between the two values represents heap consumed during
context execution. The value after freeing is typically equal to or
greater than before creation.
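The two-checkpoint rule can be pictured as a reference count guarded by a lock: sample the heap on the 0 -> 1 transition of live contexts and again on the 1 -> 0 transition. The following is a minimal, language-agnostic Python sketch under that assumption, not the PR's C++ code. `HeapCheckpointTracker`, `before_create`, `after_free` and `read_heap` are hypothetical names, and `read_heap` stands in for whatever heap query the QNN SDK actually exposes.

```python
import threading
from typing import Callable, List, Tuple


class HeapCheckpointTracker:
    """Hypothetical sketch: sample DSP heap usage only when the first context
    is about to be created and after the last context has been freed."""

    def __init__(self, read_heap: Callable[[], int]) -> None:
        self._read_heap = read_heap  # stand-in for the real SDK heap query
        self._lock = threading.Lock()
        self._live_contexts = 0
        self.records: List[Tuple[str, int]] = []

    def before_create(self) -> None:
        with self._lock:
            if self._live_contexts == 0:
                # 0 -> 1: no context exists yet, take the baseline sample
                self.records.append(("before_context_created", self._read_heap()))
            self._live_contexts += 1

    def after_free(self) -> None:
        with self._lock:
            self._live_contexts -= 1
            if self._live_contexts == 0:
                # 1 -> 0: the last context is gone, take the final sample
                self.records.append(("after_context_freed", self._read_heap()))

    def heap_delta(self) -> int:
        values = dict(self.records)
        return values["after_context_freed"] - values["before_context_created"]


# Usage: two overlapping contexts still produce exactly two samples.
samples = iter([1000, 1200])  # fake heap readings in bytes
tracker = HeapCheckpointTracker(lambda: next(samples))
tracker.before_create()
tracker.before_create()
tracker.after_free()
tracker.after_free()
print(tracker.records)  # [('before_context_created', 1000), ('after_context_freed', 1200)]
print(tracker.heap_delta())  # 200
```

Sampling only on the transitions keeps the overhead independent of how many contexts are created in between.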

Test plan:
python backends/qualcomm/tests/test_qnn_delegate.py \
  TestQNNQuantizedUtils.test_qnn_backend_runtime_option_heap_profile \
  -b build-android -H ${HOST} -s ${SN} -m ${SOC_MODEL}

Note:
This test is expected to run on a target device.

pytorch-bot · 2026-05-21T07:37:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19716

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jethroqti · 2026-05-21T07:38:00Z

@pytorchbot label "release notes: qualcomm"

jethroqti · 2026-05-21T07:45:11Z

@psiddh
I've added protection, based on #19224, for when the test runs on an x86 machine. Please take a look.
Sorry for the inconvenience.

@shewu-quic

psiddh · 2026-05-26T08:47:35Z

Can you rebase your PR please?

psiddh · 2026-05-26T08:48:23Z

@claude review this PR

claude · 2026-05-26T08:49:14Z

Claude finished @psiddh's task in 3m 29s.

Code Review: Qualcomm AI Engine Direct - Heap Profiling at Runtime on Target

Read changed files and understand the implementation
Review runtime C++ changes
Review Python/test changes
Provide review feedback

Summary

This PR adds DSP heap profiling for the QNN HTP backend at runtime. It captures heap usage at two checkpoints (before first context creation, after last context freed) and writes them to a file. The implementation uses static state to coordinate across multiple contexts.

Issues & Suggestions

1. Race condition: `WriteHeapProfile` reads runtime option without lock

QnnContextCommon.cpp:20-38: WriteHeapProfile() calls get_runtime_option(QNN_RUNTIME_HEAP_PROFILING_PATH, ...), which internally calls the backend's get_option(QNN_BACKEND, ...). That get_option holds runtime_option_mutex_, so the read itself is safe. However, WriteHeapProfile is called from both Configure() and ~QnnContext() after the mutex is released. If a concurrent set_option changes the path between the profiling decision and the WriteHeapProfile call, the profile is written to a different file than expected. This is likely low-risk in practice but worth documenting.
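One way to close that window is to capture the output path once, at the point where the profiling decision is made, and hand that snapshot to the writer. A minimal Python sketch under that assumption; `RuntimeOptions`, `set_heap_profile_path` and `snapshot_heap_profile_path` are hypothetical and do not mirror the ExecuTorch `set_option`/`get_option` API, and the device paths are placeholders.

```python
import threading
from typing import Optional


class RuntimeOptions:
    """Hypothetical option store; every read and write takes the same lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._heap_profile_path: Optional[str] = None

    def set_heap_profile_path(self, path: str) -> None:
        with self._lock:
            self._heap_profile_path = path

    def snapshot_heap_profile_path(self) -> Optional[str]:
        # Python strings are immutable, so the returned value cannot be
        # changed by a later set_heap_profile_path() call.
        with self._lock:
            return self._heap_profile_path


options = RuntimeOptions()
options.set_heap_profile_path("/data/local/tmp/heap_a.csv")
# Take the snapshot when deciding to profile, not when writing.
target = options.snapshot_heap_profile_path()
options.set_heap_profile_path("/data/local/tmp/heap_b.csv")  # concurrent change
print(target)  # /data/local/tmp/heap_a.csv
```

In C++ the equivalent would be copying the path into a local `std::string` while the lock is held and passing that copy to the writer.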

2. Potential issue: Profile handle ownership and lifetime

In QnnContextCommon.cpp:55-56, the destructor passes qnn_profiler_->GetHandle() to qnn_context_free. Then in QnnProfile::~QnnProfile(), the profile handle is freed via qnn_profile_free. This ordering relies on QnnContext destroying qnn_profiler_ (unique_ptr) after the context free call — which is guaranteed by C++ destruction order (member destructors run after the destructor body). This is correct but subtle.

3. Static state not reset across program runs

QnnContextCommon.cpp:17-18:

std::mutex QnnContext::htp_context_mutex_;
int QnnContext::htp_context_count_{0};

The static htp_context_count_ is never explicitly reset, but it does not need to be: if the program creates and destroys contexts and then creates new ones, the destructors have already decremented the count back to 0, so the new contexts are handled correctly. The edge case is a context creation that fails after the count was incremented. Without a rollback, the count would stay above 0, later contexts would never be treated as the first, and the before_context_created checkpoint would not be captured. The rollback at lines 120-126 handles this case correctly.
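The increment-then-rollback pattern can be sketched in Python as follows. This is an illustration of the general technique, not the PR's implementation. `ContextRegistry`, `create` and `first_flags` are hypothetical, and `create_fn` stands in for the real context-creation call.

```python
import threading
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ContextRegistry:
    """Hypothetical sketch of the increment-then-rollback pattern."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0
        self.first_flags: List[bool] = []  # per attempt: was it treated as first?

    def create(self, create_fn: Callable[[], T]) -> T:
        with self._lock:
            self.first_flags.append(self.count == 0)
            self.count += 1
        try:
            return create_fn()
        except Exception:
            with self._lock:
                # Undo the increment so the next attempt is still "first".
                self.count -= 1
            raise


def failing_create() -> object:
    raise RuntimeError("context creation failed")


registry = ContextRegistry()
try:
    registry.create(failing_create)
except RuntimeError:
    pass
registry.create(object)
print(registry.count, registry.first_flags)  # 1 [True, True]
```

Without the `except` branch, the second attempt would be recorded as `False` and the baseline checkpoint would be skipped.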

4. File truncation logic could lose data in edge cases

QnnProfiler.cpp:233-250 — The file open mode logic:

before_context_created → std::ios::trunc (truncates file)
after_context_freed → std::ios::app (appends)

This assumes events always arrive in order (before, then after). If for any reason the QNN SDK returns events out of order, the file would contain incorrect data. Consider adding a check, or a comment stating that the SDK guarantees this ordering.

Additionally, the file is opened and closed separately for each event write (lines 244-250). Opening with trunc for the first event and app for the second is reasonable, but opening the file once and writing both would be more efficient and atomic.
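A single-open variant with an explicit order check might look like the sketch below. It assumes the before_context_created value is kept in memory until the last context is freed, so both events are available when the file is written. `write_heap_profile` and the `identifier,value` line format are assumptions for illustration, not the PR's ProfileDataToFile.

```python
import os
import tempfile
from typing import Sequence, Tuple

EXPECTED_ORDER = ("before_context_created", "after_context_freed")


def write_heap_profile(path: str, events: Sequence[Tuple[str, int]]) -> None:
    """Write both checkpoints through one file handle (hypothetical helper)."""
    labels = tuple(label for label, _ in events)
    if labels != EXPECTED_ORDER:
        # Fail loudly instead of producing a file in the wrong order.
        raise ValueError(f"unexpected heap event order: {labels}")
    # Mode "w" truncates exactly once; both lines go through the same handle.
    with open(path, "w", encoding="utf-8") as f:
        for label, value in events:
            f.write(f"{label},{value}\n")


path = os.path.join(tempfile.mkdtemp(), "heap_profile.csv")
write_heap_profile(path, [("before_context_created", 1000), ("after_context_freed", 1200)])
```

The order check runs before the file is opened, so a rejected call leaves any previous profile untouched.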

5. Missing `runtime_option_mutex_` in `set_option` (pre-existing, but relevant)

Looking at QnnExecuTorchBackend.cpp:205, set_option does hold runtime_option_mutex_. Good — this is consistent with the newly added get_option lock at line 278.

6. Test validation could be more robust

backends/qualcomm/tests/utils.py:415-435 — The validate_heap_profile function:

values = [int(line.split(",")[1].strip()) for line in f if line.strip()]

This parsing is fragile. If the format ever changes (e.g., extra comma in the identifier), it would silently parse wrong values. Consider splitting on the last comma, or validating the identifier prefix matches expectations.

Also, difference is computed as after - before and printed as f"{difference:.2f} bytes", but difference is an int (since both values are parsed with int()). The .2f format specifier is misleading — this will always show .00.
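A stricter parser along the lines the review suggests could split on the last comma and reject unknown identifiers. A minimal sketch, assuming the file holds `identifier, value` lines and that the identifiers are exactly `before_context_created` and `after_context_freed`; `parse_heap_profile` is a hypothetical name, not the helper in backends/qualcomm/tests/utils.py.

```python
from typing import Dict

EXPECTED_LABELS = ("before_context_created", "after_context_freed")


def parse_heap_profile(text: str) -> Dict[str, int]:
    """Parse 'identifier, value' lines, splitting on the LAST comma."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label, sep, raw_value = line.rpartition(",")
        if not sep:
            raise ValueError(f"no comma in line: {line!r}")
        label = label.strip()
        if label not in EXPECTED_LABELS:
            # An unknown or reformatted identifier fails loudly.
            raise ValueError(f"unexpected identifier: {label!r}")
        values[label] = int(raw_value)  # int() tolerates surrounding spaces
    missing = [k for k in EXPECTED_LABELS if k not in values]
    if missing:
        raise ValueError(f"missing checkpoints: {missing}")
    return values


profile = parse_heap_profile("before_context_created, 1000\nafter_context_freed, 1200\n")
difference = profile["after_context_freed"] - profile["before_context_created"]
# Both values are ints, so format without ".2f" (which would print "200.00").
print(f"{difference} bytes")  # 200 bytes
```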

7. `BackendOptions<9>` — magic number

qnn_executor_runner.cpp:223 — Bumping from <8> to <9> is correct for adding one more option, but this magic number will need updating each time a new option is added. A comment or constant would help maintainability, though this follows the existing pattern.

8. Multi-line comment in `QnnContextCommon.cpp`

Lines 85-95 contain a multi-line block comment explaining the heap profiling logic. Per the codebase style guidelines in CLAUDE.md, minimal comments are preferred and code should be self-documenting. However, since this documents a QNN SDK constraint that isn't obvious from the code, it's acceptable — but could be trimmed to 2-3 lines.

Minor Nits

backends/qualcomm/debugger/README.md:81 — Good fix for the trailing space in .bin .
QnnProfiler.cpp:27-28 — The #define macros (DEFINE_HEAP_BEFORE_CREATION, DEFINE_HEAP_AFTER_FREED) would be better as constexpr string literals in the anonymous namespace, matching modern C++ style. Defines leak into all translation units that include headers.
HtpContextCustomConfig.h:38-41 — The [[maybe_unused]] approach is pragmatic; the comment explaining it is helpful.

Overall Assessment

The implementation is well-structured with proper mutex protection for the static context counter, correct rollback on failure, and appropriate x86_64 test guards. The code follows existing patterns in the QNN backend. The main areas for improvement are the file I/O pattern in ProfileDataToFile (opening file twice) and the test parsing robustness.

Verdict: Looks good overall. The thread-safety model is sound, the feature is appropriately guarded behind profile level checks, and tests cover both quantized and floating-point paths. Minor improvements suggested above but nothing blocking.

jethroqti · 2026-05-26T08:52:08Z

Re: "Can you rebase your PR please?"

Sure. @psiddh, the rebase is finished.


linux-foundation-easycla · 2026-05-26T13:38:56Z

❌ - login: @jethroqti / name: jethroqti. The commit (7a83ebe) is not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

jethroqti · 2026-05-26T13:39:26Z

@pytorchbot label "release notes: qualcomm"

jethroqti requested review from abhinaykukkadapu and psiddh as code owners May 21, 2026 07:37

meta-cla Bot added the CLA Signed label May 21, 2026

pytorch-bot Bot added the release notes: qualcomm label May 21, 2026

jethroqti force-pushed the dev1/memory/profiling_rt branch from eff7a0f to 7a83ebe May 26, 2026 13:38

jethroqti wants to merge 1 commit into pytorch:main from CodeLinaro:dev1/memory/profiling_rt
